Explaining black box models through counterfactuals

Patrick Altmeyer

Overview

  • Motivation
  • Methodological background
  • Examples in CounterfactualExplanations.jl
  • Possible research questions

Motivation

An unfortunate reality

  • From human to data-driven decision-making:
    • Today, it is more likely than not that your digital loan or employment application will be handled by an algorithm, at least in the first instance.
  • Black-box models create undesirable dynamics:
    • Human operators in charge of the system have to rely on it blindly.
    • Individuals subject to its decisions generally have no way to challenge an outcome.

“You cannot appeal to (algorithms). They do not listen. Nor do they bend.”

— Cathy O’Neil in Weapons of Math Destruction, 2016

Explainable AI (xAI)

  • interpretable = the model is inherently interpretable; no extra tools needed
  • explainable = the model is not inherently interpretable, but can be explained through xAI tools

Ante-hoc interpretability:

  • Just use interpretable models 😠 ! (GLM, decision trees, rules, …) (Rudin 2019)
  • Proxy methods construct simple representations of complex models

Post-hoc explainability:

  • Local surrogate explainers like LIME and SHAP are useful and popular, but can easily be fooled (Slack et al. 2020)
  • Counterfactual explanations explain how inputs into a system need to change for it to produce different decisions.
  • Realistic and actionable changes can be used for the purpose of algorithmic recourse.

From 🐱 to 🐶

We have fitted a black-box classifier to separate cats from dogs. One 🐱 is friends with a lot of cool 🐶 and wants to remain part of that group. The counterfactual path below shows her how to fool the classifier:

On a more serious note …

  • Ever received an automated rejection email? How was the feedback? 😕
  • What about loan applications? 😔
  • Algorithms that evaluate recidivism risk (COMPAS)? 😓

Methodology

Not so fast …

Effective counterfactuals should meet certain criteria ✅

  • closeness: the average distance between factual and counterfactual features should be small (Wachter, Mittelstadt, and Russell (2017))
  • actionability: the proposed feature perturbation should actually be actionable (Ustun, Spangher, and Liu (2019), Poyiadzi et al. (2020))
  • plausibility: the counterfactual explanation should be plausible to a human (Joshi et al. (2019))
  • unambiguity: a human should have no trouble assigning a label to the counterfactual (Schut et al. (2021))
  • sparsity: the counterfactual explanation should involve as few individual feature changes as possible (Schut et al. (2021))
  • robustness: the counterfactual explanation should be robust to domain and model shifts (Upadhyay, Joshi, and Lakkaraju (2021))
  • diversity: ideally multiple diverse counterfactual explanations should be provided (Mothilal, Sharma, and Tan (2020))
  • causality: counterfactual explanations should reflect the structural causal model underlying the data-generating process (Karimi et al. (2020), Karimi, Schölkopf, and Valera (2021))
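Most of these criteria amount to a trade-off between the loss with respect to the target class and some complexity penalty. A sketch of the generic search objective, following Wachter, Mittelstadt, and Russell (2017), where \(h(\underline{x})\) denotes the complexity (e.g. distance) penalty and \(\lambda\) its weight:

\[ \underline{x} = \arg \min_{\underline{x}} \ell(M(\underline{x}),t) + \lambda h(\underline{x}) \qquad(1)\]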

The Bayesian approach - a catchall?

  • Schut et al. (2021) note that different approaches just work with different complexity functions (\(h(\underline{x})\) in Equation 1)
  • They show that for classifiers \(\mathcal{\widetilde{M}}\) that incorporate predictive uncertainty we can drop the complexity penalty altogether:

\[ \underline{x} = \arg \min_{\underline{x}} \ell(M(\underline{x}),t) \ \ , \ \ \forall M\in\mathcal{\widetilde{M}} \qquad(3)\]
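To make the idea concrete, here is a minimal, self-contained sketch of gradient-based counterfactual search for a plain logistic classifier. This is an illustration of the general approach, not the package's implementation; the gradient of the binary cross-entropy loss is derived by hand:

```julia
# Illustrative sketch (not the package implementation): gradient descent
# on ℓ(M(x), t) for a logistic classifier, stopping once the desired
# confidence γ in the target class is reached.
σ(z) = 1 / (1 + exp(-z))

w = [1.0, -2.0]     # model coefficients
b = 0.0             # model intercept
x = [-1.0, 0.5]     # factual, classified as 0
t = 1.0             # target class
γ = 0.9             # desired confidence
η = 0.1             # step size

while σ(w'x + b) < γ
    # Gradient of binary cross-entropy w.r.t. x is (σ(w'x + b) - t) * w
    x .-= η * (σ(w'x + b) - t) * w
end
```

After the loop, `x` is a counterfactual that the model assigns to class 1 with probability at least γ.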

Examples in CounterfactualExplanations.jl

Why Julia?

Fast, transparent, beautiful 🔴🟢🟣


CounterfactualExplanations.jl is a package for generating counterfactual explanations and algorithmic recourse.

Installation

using Pkg
Pkg.add("CounterfactualExplanations")
  • To be submitted to JuliaCon 2022; if accepted, to be published in the proceedings.
  • In Python 🐍, use CARLA (Pawelczyk et al. 2021).

Usage example

Using the package, generating counterfactuals is as easy as follows:

# Some random example:
w = [1.0 -2.0] # true coefficients
b = [0] # true constant
x̅ = [-1, 0.5] # factual in class 0
target = 1.0 # target
γ = 0.9 # desired confidence

# Declare model:
using CounterfactualExplanations.Models
𝑴 = LogisticModel(w, b)

# Counterfactual search:
generator = GenericGenerator(
  0.1,0.1,1e-5,:logitbinarycrossentropy,nothing)
recourse = generate_counterfactual(
  generator, x̅, 𝑴, target, γ)

Designed to work with any custom model and generator through multiple dispatch.
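As a hypothetical sketch of what hooking in a custom model could look like: the search only needs access to the model's raw scores and probabilities, so a new model type can extend the relevant methods. The type and method names below are illustrative assumptions, not necessarily the package's documented API:

```julia
# Hypothetical sketch of plugging a custom model into the search through
# multiple dispatch. Names here are illustrative assumptions.
using CounterfactualExplanations.Models

struct MyClassifier <: Models.AbstractFittedModel
    W::Matrix{Float64}
    b::Vector{Float64}
end

# The counterfactual search only needs scores and probabilities:
Models.logits(M::MyClassifier, X) = M.W * X .+ M.b
Models.probs(M::MyClassifier, X) = 1 ./ (1 .+ exp.(-Models.logits(M, X)))
```

With these two methods in place, `generate_counterfactual` can dispatch on `MyClassifier` like any built-in model.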

Generic search - plugin MLP

Greedy search - deep ensemble

MNIST

This looks nice 🤓

And this … ugh 🥴

Research questions

Dynamics of algorithmic recourse (AR)

What happens once AR has actually been implemented? 👀

Endogenous shifts in AR

Other questions

  • To what extent does the effectiveness of counterfactual explanations (CE) depend on the quality of the classifier?
  • Are CE really more intuitive? A user study like that of Kaur et al. (2020) could tell.
  • Counterfactual explanations for time-series data?
  • More ideas from your side? 🤗

More resources

References

Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” arXiv Preprint arXiv:1907.09615.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. “Algorithmic Recourse: From Counterfactual Explanations to Interventions.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 353–62.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal Knowledge: A Probabilistic Approach.” arXiv Preprint arXiv:2006.06831.
Kaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. “Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–14.
Mothilal, Ramaravind K, Amit Sharma, and Chenhao Tan. 2020. “Explaining Machine Learning Classifiers Through Diverse Counterfactual Explanations.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–17.
Pawelczyk, Martin, Sascha Bielawski, Johannes van den Heuvel, Tobias Richter, and Gjergji Kasneci. 2021. “CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms.” arXiv Preprint arXiv:2108.00783.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations by Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. “Fooling Lime and Shap: Adversarial Attacks on Post Hoc Explanation Methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–86.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” arXiv Preprint arXiv:2102.13620.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841.